Neural Network-based Language Model for Conversational Telephone Speech Recognition

Author

Graeme W. Blackwood

Abstract

Preface

This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. I hereby declare that my thesis does not exceed the limit of length prescribed in the Special Regulations of the M.Phil. examination for which I am a candidate. The length of my thesis is 14980 words.

Acknowledgements

I would like to thank Professor Phil Woodland for his help and guidance over the course of this project. I would also like to thank David Mrva for providing a customized version of the LPlex tool capable of summing over words in a specified shortlist.

Abstract

This thesis presents a large-scale neural network language model for transcriptions of telephone conversations. By mapping n-gram contexts to a continuous vector space and training with softmax normalization, the neural network operates as a probability estimator. The smoothness of the resulting distributions yields consistently lower perplexity on restricted subsets of the vocabulary. Long training time is a major issue, so optimized linear algebra libraries are used to implement feed-forward and back-propagation efficiently during training. A word-class interpretation of the network inputs and outputs is shown to achieve lower perplexity than the n-gram model when training data is limited.
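The abstract describes the model only at a high level. As a rough illustration, here is a minimal sketch, assuming a standard feed-forward n-gram architecture of the kind described: the (n-1) context words are mapped to continuous vectors through a shared embedding table, passed through a tanh hidden layer, and normalized with a softmax over a shortlist of output words. All names (`FeedForwardNNLM`, `shortlist_perplexity`), layer sizes, and the convention that word ids `0..shortlist_size-1` form the shortlist are hypothetical. Skipping out-of-shortlist targets when computing perplexity is also an assumption, not the thesis's documented procedure. Training and back-propagation are omitted, and plain Python loops stand in for the optimized linear algebra libraries the thesis uses.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class FeedForwardNNLM:
    """Toy n-gram neural LM: (n-1) context words -> shared embeddings -> tanh layer -> softmax.

    Hypothetical sketch: word ids 0..shortlist_size-1 are assumed to be the shortlist
    (e.g. the most frequent words); only these words get output units.
    """

    def __init__(self, vocab_size, shortlist_size, order, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.order = order
        self.shortlist_size = shortlist_size
        self.embed = matrix(vocab_size, embed_dim)          # one continuous vector per input word
        self.w_hidden = matrix(hidden_dim, (order - 1) * embed_dim)
        self.b_hidden = [0.0] * hidden_dim
        self.w_out = matrix(shortlist_size, hidden_dim)     # one output unit per shortlist word
        self.b_out = [0.0] * shortlist_size

    def predict(self, context):
        """Return P(w | context) for every shortlist word w (the list sums to 1)."""
        if len(context) != self.order - 1:
            raise ValueError("context must contain exactly order - 1 word ids")
        # Map the discrete context to a continuous vector by concatenating embeddings.
        x = [v for word in context for v in self.embed[word]]
        hidden = [math.tanh(b + sum(w * xi for w, xi in zip(row, x)))
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        scores = [b + sum(w * hi for w, hi in zip(row, hidden))
                  for row, b in zip(self.w_out, self.b_out)]
        return softmax(scores)


def shortlist_perplexity(model, word_ids):
    """Perplexity over positions whose target word is in the shortlist.

    Assumption: out-of-shortlist targets are skipped rather than scored.
    """
    n = model.order
    log_prob, count = 0.0, 0
    for i in range(n - 1, len(word_ids)):
        target = word_ids[i]
        if target >= model.shortlist_size:
            continue  # not covered by the network's output layer
        probs = model.predict(word_ids[i - n + 1:i])
        log_prob += math.log(probs[target])
        count += 1
    if count == 0:
        raise ValueError("no in-shortlist targets to score")
    return math.exp(-log_prob / count)


if __name__ == "__main__":
    model = FeedForwardNNLM(vocab_size=50, shortlist_size=20, order=3, embed_dim=8, hidden_dim=16)
    print(sum(model.predict([3, 7])))
    print(shortlist_perplexity(model, [1, 4, 2, 9, 25, 0]))
```

With small random weights the softmax is nearly uniform, so this untrained model scores a perplexity close to the shortlist size of 20. Training would lower it. The softmax over the output layer, which touches every shortlist word for every context, is the dominant cost. This is why restricting the outputs to a shortlist and using optimized matrix routines matter at scale.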


Similar resources

Improving English Conversational Telephone Speech Recognition

The goal of this work is to build a state-of-the-art English conversational telephone speech recognition system. We investigated several techniques to improve acoustic modeling, namely speaker-dependent bottleneck features, deep Bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks, data augmentation and score fusion of DNN and BLSTM models. The training set consisted of the 300 ho...

The IBM 2015 English conversational telephone speech recognition system

We describe the latest improvements to the IBM English conversational telephone speech recognition system. Some of the techniques that were found beneficial are: maxout networks with annealed dropout rates; networks with a very large number of outputs trained on 2000 hours of data; joint modeling of partially unfolded recurrent neural networks and convolutional nets by combining the bottleneck ...

Lexicon-Free Conversational Speech Recognition with Neural Networks

We present an approach to speech recognition that uses only a neural network to map acoustic input to characters, a character-level language model, and a beam search decoding procedure. This approach eliminates much of the complex infrastructure of modern speech recognition systems, making it possible to directly train a speech recognizer using errors generated by spoken language understanding ...

The IBM 2016 English Conversational Telephone Speech Recognition System

We describe a collection of acoustic and language modeling techniques that lowered the word error rate of our English conversational telephone LVCSR system to a record 6.6% on the Switchboard subset of the Hub5 2000 evaluation testset. On the acoustic side, we use a score fusion of three strong models: recurrent nets with maxout activations, very deep convolutional nets with 3x3 kernels, and bi...

Conversational telephone speech recognition

This paper describes the development of a speech recognition system for the processing of telephone conversations, starting with a state-of-the-art broadcast news transcription system. We identify major changes and improvements in acoustic and language modeling, as well as decoding, which are required to achieve state-of-the-art performance on conversational speech. Some major changes on the aco...

Hierarchies of neural networks for connectionist speech recognition

We present a principled framework for context-dependent hierarchical connectionist HMM speech recognition. Based on a divide-and-conquer strategy, our approach uses an Agglomerative Clustering algorithm based on Information Divergence (ACID) to automatically design a soft classifier tree for an arbitrary large number of HMM states. Nodes in the classifier tree are instantiated with small estimator...

Publication date: 2005